
Fix extended-query isolation for pooled postgres.js clients in pglite-socket #977

Open
sorenbs wants to merge 1 commit into electric-sql:main from sorenbs:fix/pglite-socket-ready-for-query-isolation

sorenbs commented Apr 22, 2026

Summary

This fixes a protocol-isolation bug in packages/pglite-socket that shows up when a postgres.js pool uses multiple concurrent connections against PGLiteSocketServer.

The core problem was that the socket server queued and scheduled individual frontend protocol messages globally across handlers, but only pinned handler ownership while db.isInTransaction() was true. That is not enough for the PostgreSQL extended query protocol, because unnamed prepared-statement state lives until the backend reaches ReadyForQuery, not only while SQL transaction state is open.

In practice, message sequences like Parse / Bind / Execute / Sync from different logical clients could interleave against the same backend session state. That produced errors like:

  • PostgresError: unnamed prepared statement does not exist
  • code: 26000
  • routine: exec_bind_message

How we hit this

I ran into this while working on a Prisma Dev / Durable Streams demo that used a PGlite-backed Postgres runtime behind pglite-socket.

When opening Prisma Studio against that runtime, the first load would sometimes show a red introspect error and then succeed on refresh. Prisma Studio uses postgres.js, and its initial page load issues concurrent metadata/introspection-style queries. One of the concurrent queries is a timezone read:

select current_setting('timezone') as timezone

Against pglite-socket, that concurrent startup pattern could fail even though the same queries worked:

  • sequentially
  • or with a client pool forced to max: 1

That suggested the bug was below Prisma Dev and below the WAL/Streams layer.

Reproduction

I isolated this down to plain PGLiteSocketServer, with no Prisma Dev and no WAL stream involved.

Studio-shaped repro observed during investigation

The failure pattern that matched Prisma Studio was:

  1. Start PGLiteSocketServer({ maxConnections: 10 })
  2. Open postgres(url, { max: 10 })
  3. Run concurrently from the same pool:
    • a metadata/catalog query
    • select current_setting('timezone') as timezone
  4. Observe 26000 / exec_bind_message

Regression test added in this PR

For a stable package test, this PR adds a smaller repro that isolates the same protocol bug without depending on Prisma internals:

  • sql.unsafe('select $1::int as value', [i])
  • sql.unsafe("select current_setting('timezone') as timezone", [])

Both are run concurrently through a postgres.js pool with max: 10.

Before this fix, that failed reliably in local verification. After this fix, it passes consistently.

Root cause

QueryQueueManager was effectively serializing individual frontend messages, not whole extended-query exchanges.

That meant handler A could send Parse and then handler B could get scheduled before handler A reached Sync / ReadyForQuery. Because the backend session is shared, the unnamed statement/portal state could be overwritten or cleared before handler A's later Bind / Execute, which explains the unnamed prepared statement does not exist failure.
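
To make the interleaving concrete, here is a toy model of a single shared backend session. It is a simplification for illustration only, not code from pglite-socket: ToyBackend, replay, and the message shapes are hypothetical, and only the unnamed prepared statement is modelled. It relies on two documented protocol rules: a Parse targeting the unnamed statement replaces the previous one, and a simple Query destroys it. Whether the second handler cleared or overwrote the statement in the real failure is not something this sketch establishes.

```typescript
// Simplified model of one shared backend session. Only the unnamed
// prepared statement is tracked; real PostgreSQL tracks far more state.
type FrontendMessage =
  | { type: "Parse"; sql: string } // Parse into the unnamed statement
  | { type: "Bind" } // Bind against the unnamed statement
  | { type: "Query"; sql: string }; // simple query protocol

class ToyBackend {
  private unnamed: string | null = null;

  handle(msg: FrontendMessage): string {
    switch (msg.type) {
      case "Parse":
        // A new unnamed Parse replaces the previous unnamed statement.
        this.unnamed = msg.sql;
        return "ParseComplete";
      case "Query":
        // A simple Query also destroys the unnamed statement.
        this.unnamed = null;
        return "CommandComplete";
      case "Bind":
        if (this.unnamed === null) {
          return "ErrorResponse 26000: unnamed prepared statement does not exist";
        }
        return `BindComplete for: ${this.unnamed}`;
    }
  }
}

// Replays one global message order against a fresh backend and
// collects the reply to each message.
function replay(messages: FrontendMessage[]): string[] {
  const backend = new ToyBackend();
  return messages.map((m) => backend.handle(m));
}

const timezoneSql = "select current_setting('timezone') as timezone";
const parseA: FrontendMessage = { type: "Parse", sql: "select $1::int as value" };
const parseB: FrontendMessage = { type: "Parse", sql: timezoneSql };
const queryB: FrontendMessage = { type: "Query", sql: timezoneSql };
const bindA: FrontendMessage = { type: "Bind" };

// Handler A finishes before handler B runs: A's Bind sees A's statement.
const isolated = replay([parseA, bindA, queryB]);
// Handler B's simple Query lands between A's Parse and Bind: error 26000.
const cleared = replay([parseA, queryB, bindA]);
// Handler B's Parse lands in between: A's Bind silently targets B's SQL.
const overwritten = replay([parseA, parseB, bindA]);

console.log(isolated[1]);
console.log(cleared[2]);
console.log(overwritten[2]);
```

The overwritten case is arguably the more dangerous one, since it produces no error at all in this model.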

Fix

This change introduces handler ownership at the protocol level:

  • track an activeHandlerId
  • once a handler starts an extended-protocol exchange, keep scheduling its queued protocol messages ahead of other handlers
  • parse backend responses and retain ownership until the backend emits ReadyForQuery
  • release ownership when ReadyForQuery reports the idle transaction status (I)
  • also release ownership on Terminate packets so shutdown/connection close does not strand other queued work
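
As a rough illustration of the "parse backend responses" step, the sketch below scans a buffer of backend bytes for ReadyForQuery. It is not the code in this PR: lastReadyForQueryStatus, backendMessage, and concat are hypothetical helpers. It assumes only the standard wire framing (one type byte, a big-endian int32 length that includes itself, then the payload) and that ReadyForQuery is type 'Z' with a one-byte status of I, T, or E.

```typescript
type TxStatus = "I" | "T" | "E";

// Walks the complete backend messages in `buf` and returns the status byte
// of the last ReadyForQuery ('Z') found, or null if there is none.
function lastReadyForQueryStatus(buf: Uint8Array): TxStatus | null {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  let offset = 0;
  let status: TxStatus | null = null;
  while (offset + 5 <= buf.byteLength) {
    const type = view.getUint8(offset);
    const length = view.getInt32(offset + 1); // big-endian, includes its own 4 bytes
    if (length < 4) break; // malformed frame; stop scanning
    const end = offset + 1 + length;
    if (end > buf.byteLength) break; // partial message; wait for more bytes
    if (type === 0x5a && length === 5) {
      const s = String.fromCharCode(view.getUint8(offset + 5));
      if (s === "I" || s === "T" || s === "E") status = s;
    }
    offset = end;
  }
  return status;
}

// Test helper: frames a payload as a backend message.
function backendMessage(type: string, payload: number[]): Uint8Array {
  const out = new Uint8Array(5 + payload.length);
  const view = new DataView(out.buffer);
  view.setUint8(0, type.charCodeAt(0));
  view.setInt32(1, 4 + payload.length);
  out.set(payload, 5);
  return out;
}

// Test helper: concatenates byte chunks into one buffer.
function concat(parts: Uint8Array[]): Uint8Array {
  const total = parts.reduce((n, p) => n + p.byteLength, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.byteLength;
  }
  return out;
}

const parseComplete = backendMessage("1", []);
const idleStream = concat([parseComplete, backendMessage("Z", ["I".charCodeAt(0)])]);
const inTxStream = concat([parseComplete, backendMessage("Z", ["T".charCodeAt(0)])]);

// Ownership would be released only for the idle status.
console.log(lastReadyForQueryStatus(idleStream) === "I");
console.log(lastReadyForQueryStatus(inTxStream) === "I");
```

A production version would also need to carry partial messages across socket chunks, which this sketch sidesteps by stopping at the first incomplete frame.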

This keeps logical client state isolated even though the underlying backend session is shared.
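
The scheduling rule can be sketched as a small queue that prefers the current owner. This is a minimal model under stated assumptions, not the actual QueryQueueManager: OwnershipQueue, its methods, and the string handler ids are hypothetical, and it assumes other handlers wait while the owner has nothing queued. Only the activeHandlerId idea comes from the description above. The demo treats every Sync as producing ReadyForQuery with status I.

```typescript
interface QueuedMessage {
  handlerId: string;
  label: string; // e.g. "Parse", "Bind", "Execute", "Sync"
}

class OwnershipQueue {
  private queue: QueuedMessage[] = [];
  private activeHandlerId: string | null = null;

  enqueue(handlerId: string, label: string): void {
    this.queue.push({ handlerId, label });
  }

  // Returns the next message to send to the backend. While a handler owns
  // the session, only its messages are eligible; otherwise plain FIFO.
  next(): QueuedMessage | null {
    let index = 0;
    if (this.activeHandlerId !== null) {
      const owner = this.activeHandlerId;
      index = this.queue.findIndex((m) => m.handlerId === owner);
      if (index === -1) return null; // owner has nothing queued yet; others wait
    }
    const [msg] = this.queue.splice(index, 1);
    if (!msg) return null; // queue is empty
    this.activeHandlerId = msg.handlerId; // pin ownership to this handler
    return msg;
  }

  // Called when the backend reports ReadyForQuery (I) for this handler,
  // or when the handler sends Terminate.
  release(handlerId: string): void {
    if (this.activeHandlerId === handlerId) this.activeHandlerId = null;
  }
}

// Drains the queue, releasing ownership after each Sync.
function drain(q: OwnershipQueue): string[] {
  const order: string[] = [];
  for (let msg = q.next(); msg !== null; msg = q.next()) {
    order.push(`${msg.handlerId}:${msg.label}`);
    if (msg.label === "Sync") q.release(msg.handlerId);
  }
  return order;
}

// Two handlers enqueue their messages fully interleaved.
const q = new OwnershipQueue();
for (const label of ["Parse", "Bind", "Execute", "Sync"]) {
  q.enqueue("A", label);
  q.enqueue("B", label);
}
const order = drain(q);
console.log(order.join(", "));
// prints "A:Parse, A:Bind, A:Execute, A:Sync, B:Parse, B:Bind, B:Execute, B:Sync"
```

Even though arrival order alternates between A and B, each exchange reaches the backend as one contiguous run.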

Verification

Ran:

pnpm --filter @electric-sql/pglite-socket exec vitest tests/query-with-postgres-js-concurrency.test.ts
pnpm --filter @electric-sql/pglite-socket exec vitest

Result locally:

  • new regression test passes
  • full pglite-socket suite passes (60 tests)

Files

  • packages/pglite-socket/src/index.ts
  • packages/pglite-socket/tests/query-with-postgres-js-concurrency.test.ts

If helpful, I can also add a second regression that uses a more Studio-shaped catalog query pair, but I kept the committed test minimal and protocol-focused.
